147 research outputs found
Best of Both Worlds – Relational Databases and Statistics
Statistics software packages and relational database systems overlap
considerably in the area of data loading, handling, and
transformation. However, only databases are primarily optimized
for high performance in this area. In this paper, we present
our approach to bringing the best of these two worlds together.
We integrate the analytics-optimized database MonetDB and the R
environment for statistical computing in a non-obtrusive, transparent,
and compatible way.
Don't hold my data hostage - A case for client protocol redesign
Transferring a large amount of data from a database to a
client program is a surprisingly expensive operation. The
time this requires can easily dominate the query execution
time for large result sets. This represents a significant hurdle
for external data analysis, for example when using statistical
software. In this paper, we explore and analyse the result set
serialization design space. We present experimental results
from a large chunk of the database market and show the
inefficiencies of current approaches. We then propose a
columnar serialization method that improves transmission
performance by an order of magnitude.
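To make the contrast between row-wise and columnar result set serialization concrete, here is a minimal sketch in Python. It is not the wire format proposed in the paper: the two-column schema (a 32-bit integer and a 64-bit float), the length prefix, and the function names are all assumptions chosen for illustration.

```python
import struct

# Hypothetical toy schema: each row is (id: int32, value: float64).

def serialize_rows(rows):
    """Row-wise layout: a row count, then each row packed one after another."""
    out = bytearray(struct.pack("<I", len(rows)))
    for row_id, value in rows:
        out += struct.pack("<id", row_id, value)
    return bytes(out)

def serialize_columns(rows):
    """Columnar layout: a row count, then all ids, then all values."""
    n = len(rows)
    ids = [row[0] for row in rows]
    values = [row[1] for row in rows]
    return struct.pack(f"<I{n}i{n}d", n, *ids, *values)

def deserialize_columns(buf):
    """Decode the columnar layout with one bulk unpack per column."""
    (n,) = struct.unpack_from("<I", buf, 0)
    ids = struct.unpack_from(f"<{n}i", buf, 4)
    values = struct.unpack_from(f"<{n}d", buf, 4 + 4 * n)
    return list(ids), list(values)

rows = [(1, 0.5), (2, 1.5), (3, 2.5)]
ids, values = deserialize_columns(serialize_columns(rows))
```

In this toy case both layouts occupy the same number of bytes. The difference is that the columnar buffer stores each column contiguously, so the client can decode a whole column in a single call instead of handling one row at a time.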
Data Management for Data Science - Towards Embedded Analytics
The rise of Data Science has caused an influx of new users in need of data management solutions. However, instead of utilizing existing RDBMS solutions they are opting to use a stack of independent solutions for data storage and processing glued together by scripting languages. This is not because they do not need the functionality that an integrated RDBMS provides, but rather because existing RDBMS implementations do not cater to their use case. To solve these issues, we propose a new class of data management systems: embedded analytical systems. These systems are tightly integrated with analytical tools, and provide fast and efficient access to the data stored within them. In this work, we describe the unique challenges and opportunities w.r.t. workloads, resilience and cooperation that are faced by this new class of systems and the steps we have taken towards addressing them in the DuckDB system.
Relational queries with a tensor processing unit
Tensor Processing Units are specialized hardware devices built to train and apply Machine Learning models at high speed through high-bandwidth memory and massive instruction parallelism. In this short paper, we investigate how relational operations can be translated to those devices. We present a mapping of relational operators to TPU-supported TensorFlow operations and experimental results comparing them with GPU and CPU implementations. Results show that while raw speeds are enticing, TPUs are unlikely to improve relational query processing for now due to a variety of issues.
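As a rough illustration of what expressing relational operators as dense tensor operations can look like, the sketch below writes a filtered sum and a GROUP BY sum as a mask product and a one-hot matrix-vector product. It uses plain Python lists rather than TensorFlow, and it is not the mapping used in the paper: the function names and the dense integer group keys are assumptions made for this example.

```python
def one_hot(keys, num_groups):
    """Build a num_groups x len(keys) matrix with a 1.0 where a row belongs to a group."""
    return [[1.0 if key == group else 0.0 for key in keys]
            for group in range(num_groups)]

def matvec(matrix, vector):
    """Dense matrix-vector product, the kind of operation accelerators are built for."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def group_by_sum(keys, values, num_groups):
    """SELECT key, SUM(value) ... GROUP BY key, written as a one-hot matrix product."""
    return matvec(one_hot(keys, num_groups), values)

def masked_sum(values, mask):
    """SELECT SUM(value) ... WHERE predicate, with the predicate given as a 0/1 mask."""
    return sum(v * m for v, m in zip(values, mask))

keys = [0, 1, 0, 2]
values = [10.0, 20.0, 30.0, 40.0]
per_group = group_by_sum(keys, values, num_groups=3)
filtered = masked_sum(values, [1.0 if v > 15 else 0.0 for v in values])
```

The one-hot matrix grows with the number of groups times the number of rows, which hints at why a direct translation of this kind is not automatically efficient.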
- …